Empirical Results on Convergence and Exploration in Approximate Policy Iteration

Authors

  • Niket S. Kaisare
  • Jong Min Lee
  • Jay H. Lee
Abstract

In this paper, we empirically investigate the convergence properties of policy iteration applied to the optimal control of systems with continuous state and action spaces. We demonstrate that policy iteration requires fewer iterations than value iteration to converge, but requires more function evaluations to generate cost-to-go approximations in the policy evaluation step. Two alternatives to policy evaluation, based on iteration over simulated states and on simulation of improved policies, are presented. We then demonstrate that the λ-policy iteration method, with λ ∈ [0, 1], is a tradeoff between value and policy iteration. Finally, the issue of exploration to expand the coverage of the state space during offline iteration is also considered. Copyright © 2005 IFAC
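The tradeoff described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper treats continuous state and action spaces with cost-to-go approximation, whereas this sketch assumes a small finite, discounted, cost-minimizing problem with exact linear solves. The function name `lambda_policy_iteration`, the arrays `P` and `g`, and the discount factor `alpha` are hypothetical and chosen only for illustration. Under these assumptions, each step takes the greedy policy μ for the current J and solves (I − λαP_μ) J_new = g_μ + (1 − λ)αP_μ J, which reduces to a value-iteration step at λ = 0 and to full policy evaluation at λ = 1.

```python
import numpy as np


def lambda_policy_iteration(P, g, alpha, lam, tol=1e-8, max_iter=10_000):
    """Sketch of lambda-policy iteration on a finite MDP (illustrative only).

    P[a] is the (n, n) transition matrix for action a,
    g[a] is the (n,) vector of stage costs for action a,
    alpha is the discount factor and lam is lambda in [0, 1].
    Returns (cost-to-go, greedy policy, number of iterations used).
    """
    n_states = P.shape[1]
    states = np.arange(n_states)
    J = np.zeros(n_states)
    mu = np.zeros(n_states, dtype=int)
    for k in range(1, max_iter + 1):
        # Policy improvement: greedy (cost-minimizing) action for current J.
        Q = g + alpha * (P @ J)          # shape (n_actions, n_states)
        mu = Q.argmin(axis=0)
        P_mu = P[mu, states]             # row s is P[mu[s], s, :]
        g_mu = g[mu, states]
        # Partial policy evaluation:
        # (I - lam*alpha*P_mu) J_new = g_mu + (1 - lam)*alpha*P_mu J
        A = np.eye(n_states) - lam * alpha * P_mu
        b = g_mu + (1.0 - lam) * alpha * (P_mu @ J)
        J_new = np.linalg.solve(A, b)
        if np.max(np.abs(J_new - J)) < tol:
            return J_new, mu, k
        J = J_new
    return J, mu, max_iter


# Hypothetical 3-state, 2-action problem (made-up numbers for illustration).
P = np.array([
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]],
    [[0.2, 0.8, 0.0], [0.0, 0.2, 0.8], [0.8, 0.0, 0.2]],
])
g = np.array([
    [1.0, 2.0, 3.0],
    [1.5, 1.0, 0.5],
])

for lam in (0.0, 0.5, 1.0):
    J, mu, iters = lambda_policy_iteration(P, g, alpha=0.9, lam=lam)
    print(f"lambda={lam}: iterations={iters}, policy={mu}, J={J}")
```

In this toy setting the per-iteration cost is a linear solve for every λ, so the sketch shows only the iteration-count side of the tradeoff; the function-evaluation cost of policy evaluation discussed in the abstract arises in the simulation-based, approximate setting.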

Similar resources

Some New Existence, Uniqueness and Convergence Results for Fractional Volterra-Fredholm Integro-Differential Equations

This paper demonstrates a study on some significant latest innovations in the approximated techniques to find the approximate solutions of Caputo fractional Volterra-Fredholm integro-differential equations. To this aim, the study uses the modified Adomian decomposition method (MADM) and the modified variational iteration method (MVIM). A wider applicability of these techniques are based on thei...

Pareto-optimal Solutions for Multi-objective Optimal Control Problems using Hybrid IWO/PSO Algorithm

Heuristic optimization provides a robust and efficient approach for extracting approximate solutions of multi-objective problems because of their capability to evolve a set of non-dominated solutions distributed along the Pareto frontier. The convergence rate and suitable diversity of solutions are of great importance for multi-objective evolutionary algorithms. The focu...

Approximate Policy Iteration: A Survey and Some New Methods

We consider the classical policy iteration method of dynamic programming (DP), where approximations and simulation are used to deal with the curse of dimensionality. We survey a number of issues: convergence and rate of convergence of approximate policy evaluation methods, singularity and susceptibility to simulation noise of policy evaluation, exploration issues, constrained and enhanced polic...

Solving Multi-objective Optimal Control Problems of Chemical Processes using Hybrid Evolutionary Algorithm

Evolutionary algorithms have been recognized to be suitable for extracting approximate solutions of multi-objective problems because of their capability to evolve a set of non-dominated solutions distributed along the Pareto frontier. This paper applies an evolutionary optimization scheme, inspired by Multi-objective Invasive Weed Optimization (MOIWO) and Non-dominated Sorting (NS) strategi...

Error Bounds for Approximate Policy Iteration

In Dynamic Programming, convergence of algorithms such as Value Iteration or Policy Iteration results, in discounted problems, from a contraction property of the back-up operator, guaranteeing convergence to its fixed point. When approximation is considered, known results in Approximate Policy Iteration provide bounds on the closeness to optimality of the approximate value function obtained by suc...

Publication date: 2005